Preconditioning for Hessian-Free Optimization
Authors
Abstract
Recently, Martens adapted the Hessian-free optimization method to the training of deep neural networks. One key aspect of this approach is that the Hessian is never computed explicitly; instead, the conjugate gradient (CG) algorithm is used to compute the new search direction using only matrix-vector products of the Hessian with arbitrary vectors, which can be done efficiently with a variant of the backpropagation algorithm. Recent algorithms use diagonal preconditioners to reduce the number of CG iterations, since such preconditioners are easy to compute and apply. Unfortunately, in the later stages of the optimization these diagonal preconditioners are not as well suited for the inner iteration as they are in the earlier stages, mostly because an increased number of elements of the dense Hessian have the same order of magnitude near an optimum. We construct a sparse approximate inverse (SPAI) preconditioner to accelerate the inner iteration, especially in the later stages of the optimization. The quality of our preconditioner depends on a predefined sparsity pattern. We exploit the known pattern of the Gauss-Newton approximation of the Hessian to construct the pattern for our preconditioner efficiently; the preconditioner can then be computed fully in parallel on GPUs. This preconditioner is applied to a deep auto-encoder test case using different update strategies.
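The inner iteration described in the abstract can be sketched as a preconditioned CG solver that touches the matrix only through matrix-vector products, exactly the setting of Hessian-free optimization. This is a minimal illustrative sketch, not the paper's code: the function name `pcg`, the toy 2x2 system, and the Jacobi (diagonal) preconditioner are assumptions made for demonstration; in the paper's setting `matvec` would be a Hessian- or Gauss-Newton-vector product computed by backpropagation, and `precond` would apply the SPAI preconditioner.

```python
import numpy as np

def pcg(matvec, b, precond, tol=1e-8, max_iter=100):
    """Preconditioned conjugate gradient using only matrix-vector products.

    matvec(v)  -> A @ v   (e.g. a Hessian-vector product)
    precond(r) -> M^{-1} r (e.g. a diagonal or SPAI preconditioner)
    """
    x = np.zeros_like(b)
    r = b - matvec(x)          # initial residual
    z = precond(r)             # preconditioned residual
    p = z.copy()               # initial search direction
    rz = r @ z
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)  # step length along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p  # new conjugate direction
        rz = rz_new
    return x

# Toy SPD system with a Jacobi preconditioner M = diag(A).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = pcg(lambda v: A @ v, b, lambda r: r / np.diag(A))
```

A better preconditioner (closer to A^{-1}) reduces the number of inner iterations, which is the motivation for replacing the diagonal preconditioner by a sparse approximate inverse in the later stages of the optimization.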
Similar works
Preconditioning the Pressure Tracking in Fluid Dynamics by Shape Hessian Information
Potential flow pressure matching is a classical inverse design aerodynamic problem. The resulting loss of regularity during the optimization poses challenges for shape optimization with normal perturbation of the surface mesh nodes. Smoothness is not enforced by the parameterization but by a proper choice of the scalar product based on the shape Hessian, which is derived in local coordinates fo...
Dynamic scaling based preconditioning for truncated Newton methods in large scale unconstrained optimization
This paper deals with the preconditioning of truncated Newton methods for the solution of large scale nonlinear unconstrained optimization problems. We focus on preconditioners which can be naturally embedded in the framework of truncated Newton methods, i.e. which can be built without storing the Hessian matrix of the function to be minimized, but only based upon information on the Hessian obt...
Towards Matrix-Free AD-Based Preconditioning of KKT Systems in PDE-Constrained Optimization
The presented approach aims at solving an equality constrained, finite-dimensional optimization problem, where the constraints arise from the discretization of some partial differential equation (PDE) on a given space grid. For this purpose, a stationary point of the Lagrangian is computed using Newton’s method, which requires the repeated solution of KKT systems. The proposed algorithm focuses...
A preconditioning technique for a class of PDE-constrained optimization problems
We investigate the use of a preconditioning technique for solving linear systems of saddle point type arising from the application of an inexact Gauss–Newton scheme to PDE-constrained optimization problems with a hyperbolic constraint. The preconditioner is of block triangular form and involves diagonal perturbations of the (approximate) Hessian to insure nonsingularity and an approximate Schur...
An Efficient Dimer Method with Preconditioning and Linesearch
The dimer method is a Hessian-free algorithm for computing saddle points. We augment the method with a linesearch mechanism for automatic step size selection as well as preconditioning capabilities. We prove local linear convergence. A series of numerical tests demonstrate significant performance gains.